Scenarios for the Altamira cave CO2 concentration from 1950 to 2100

A data-driven approach insensitive to the initial conditions was developed to extract governing equations for the concentration of CO2 in the Altamira cave (Spain) and its two main drivers: the outside temperature and the soil moisture. This model was then reformulated in order to use satellite observations and meteorological predictions, as a forcing. The concentration of CO2 inside the cave was then investigated from 1950 to 2100 under various scenarios. It is found that extreme levels of CO2 were reached during the period 1950–1972 due to the massive affluence of visitors. It is demonstrated that it is possible to monitor the CO2 in the cave in real time using satellite information as an external forcing. For the future, it is shown that the maximum values of CO2 will exceed the levels reached during the 1980s and the 1990s when the CO2 introduced by the touristic visits, although intentionally reduced, still enhanced considerably the micro corrosion of walls and pigments.


Data
The detailed description of the observational time series used in the study is provided in the following paragraphs.Their main characteristics are summarized in Supplementary Table 1.
1.1 In situ data Several in situ data sets were considered in the study, including variables measured inside (CO 2 concentration) and outside the cave (air temperature and volumetric water content in the soil).The positions of the sensors are shown in Fig. 1B.The original time series used in the analysis are shown in Supplementary Fig. 1, as well as the effect of their preprocessing.Depending on the context, the time series were used for modelling and/or validation.Their detailed description is given hereafter.
A micro-environmental station recorded the CO 2 concentration and the microclimatic data [72] inside the Polychromes Hall.It consisted of an 8-channel, 16-bit datalogger (COMBILOG TF 1020, Theodor Fiedrich & Co., Germany) with a suite of probes for CO 2 concentration (ITR 498 ADOS, Germany).The time series of CO 2 concentration (in ppm) was measured in the Polychromes Hall air during the period 28 March 2007 to 27 September 2012.Different sampling time were used since 2007 until 28 April 2009 (5 minutes), until 01 October 2009 (10 minutes), until 25 November 2009 (30 minutes), and until 24 November 2010 (60 minutes) and for the rest of the time series (15 minutes).
Complementary data sets of CO 2 concentration were also used, mainly for validation.From 23 July 2004 to 13 September 2005, records of CO 2 concentration were performed with a 15minute sampling in the Polychromes hall [19][20].Still in the same hall, records of CO 2 concentration were performed with a 60-minute sampling from 31 January 1997 to 07 August 1998 [15].
A weather station located on the hill above the cave, with two autonomous dataloggers, stored the air temperature (HOBO U23 Pro v2, Onset, Bourne, MA, USA), and the soil VWC at 5 cm depth (HOBO U12, Onset, Bourne, MA, USA, equipped with an ECHO EC-10 of Decagon Devices).The time series of the outside temperature (in Celsius) was measured during the period 28 March 2007 to 27 September 2012 (15 minutes sampling time).The time series of the Volumetric Water Content (VWC) in the upper layer of soil lying on the surface over the cave was measured during the period 20 June 2007 to 27 September 2012.This variable is expressed as fraction of the total volume (in m 3 /m 3 ), therefore it cannot surpass the soil porosity, which is, according to laboratory characterizations, 0.48 m 3 /m 3 [21].The sampling time was 30 minutes until 03 February 2010, 60 minutes until 24 November 2010 and 30 minutes for the rest of the time series.
In general, the observed time series are characterized by a short time scale superimposed to the seasonal time scale signal.A low pass filter was applied to the time series of CO 2 concentration, the outside temperature and the VWC, since this study is focused on modelling the seasonal scale.The time series were smoothed using a local weighted regression [73] with a smoothing parameter α of 0.1 in all cases (except the CO 2 time series of 1997-98, for which the smoothing parameter was 0.25).The smoothed time series were re-sampled to daily values by averaging the values.The gaps in the resulting time series were moderate in size, therefore they were filled using cubic spline interpolation following previous analyses using the global modelling technique [33].However, the time series of CO 2 of 2007-2012 had three gaps longer than two months and smaller gaps as well.Only the small gaps of this time series were filled, whereas the long ones were kept as they were initially (see Supplementary Fig. 1C).
The time series of the soil VWC shows an inter-annual decrease as a consequence of an artificial drift, which is commonly observed with the type of probe used to collect the data, and needs to be corrected [74].To estimate and remove the drift, a moving average was calculated from 2009 to 2011 with a one-year sliding window and the result was smoothed using the local weighted regression.When the drift starts, its value must be zero; therefore, we subtracted from the filtered moving average its first value.Finally, the estimated drift was subtracted from the VWC time series.
1.2 Soil Moisture -The SMOS satellite data SMOS satellite is the first mission dedicated to the measurements of surface soil moisture from space.Initiated by the end of the 1990s, it was launched on 02 November 2009, and started to collect scientific data of soil moisture on 17 January 2010 [50].As the first months were dedicated to the commissioning phase, data are available since May 2010 until now.
Starting from the SMOS level 2 product, a specific processing, more adapted than the SMOS level 3 product commonly used, was considered in order to take the location of the site of study (~4.5 km from the ocean) and the large footprint of the SMOS measurements [75] into account.This processing enabled to keep the maximum of valid measurements in the crossing zone between ocean and continent at the closer vicinity to the studied site.This product was generated on a 15 km resolution grid and corresponds to version 700.Three time series were generated.These were compared to the in situ time series.Only the one enabling the best agreement with the in situ time series was kept.The selected time series is shown in Supplementary Fig. 4 and its area of selection on Fig. 1D.
The following processing was then applied to the time series.Initially irregular, the sampling time was homogenized to one value every 12 hours.Note that, as for any other forcing time series, its sampling time is double than that of the CO 2 time series because it is required by the Runge-Kutta method, which was chosen for numerical integration.In the first place, the original time series were interpolated to one value per second by means of the Stineman's interpolation [76].Then the values were aggregated using the median on 12-hour windows: 14:00-01:59 and 02:00-13.59,centered at 8.00 and 20:00 (because Altamira in situ time series are centered at 20:00 too).The resulting time series were smoothed using a LOESS smoothing with parameter α equal to 0.05.Because the means of the satellite and the in situ time series were quite different (0.251 and 0.367 m 3 /m 3 , respectively), the satellite time series were re-calibrated.First, the mean of the SMOS time series was modified to match the mean of the in situ time series by adding a constant (0.115846).Then, a dumping function was used on the corrected SMOS time series to smooth excessively large maxima.The R 2 coefficient between the in situ and the preprocessed satellite data was of 78%.The time series at the closest vicinity to the site of study was also the one in with best agreement with the in situ observations.Only this one was considered for the analyses.The ESA CCI SM v4 product was used in the present study.This data set is exclusively based on satellite data.It merges active, passive, and combined active-passive products.These are sampled to daily images on a regular grid of 0.25° space resolution, at global scale.Six grid points were tested in the study.The quality of the extracted time series highly depends on the availability of the original data sets at the site of study.At the closest grid point here considered, data were particularly scarce up to 2012, therefore the next closest grid point was used instead (see Fig. 1D), from September 2003 (to 30 December 2012).
This time series was then re-sampled at a 12-hour sampling time using the Stinemann's interpolation.Due to important differences of means in comparison to the in situ time series, a recalibration was applied (see Supplementary Fig. 5  This data set provides a gridded product of 0.5° resolution with a daily sampling time.Time series of soil moisture (0-10 cm depth) were extracted at the closest grid points (see Fig. 1D).The two time series on the coastal area were directly rejected as inconsistent and a single time series was kept for the analysis (see Supplementary Fig. 6).
The selected time series was first re-sampled at a 12-hour sampling time using cubic splines.As for the other data sets, the means of this time series and the in situ time series being significantly different (0.255 and 0.367 m 3 /m 3 , respectively), the time series was recalibrated following the procedure described in methods Section.

Land Surface Temperature -The MODIS data set
The brightness temperature measured from satellite in the optic bands can be used to estimate land surface temperature.
The longest satellite data set presently available for Land Surface Temperature is the MODIS11-C3 v6 product, provided by the National Aeronautics and Space Administration (NASA) with a monthly sampling time on a 0.05° grid, and available from 15 February 2000 to 15 June 2021.The time series (see Supplementary Fig. 7) located at the closest vicinity to the cave of Altamira was extracted (see Fig. 1D).
This time series was up-sampled to one value every 12 hours, by means of cubic splines.As for soil moisture time series, the sampling time was doubled in comparison to the CO 2 time series (12-hour vs. 24-hour) as required by the Runge-Kutta method.To ensure the consistency with the processing applied to the other variables, a LOESS smoothing [73] was then applied to the up-sampled time series, with a smoothing parameter of 0.05.A recalibration, whose description is provided in methods Section, was also applied.

Land Surface Temperature -The ECMWF model
For more remote periods (no appropriate LST data from satellite was available before 2000), re-analyses from the European Centre for Medium-range Weather Forecasts (ECMWF) were used.
The ERA5 product was chosen.This is a gridded product of hourly sampling time and 0.1° space resolution available from 1950 to 2020.The grid pixel considered in the study is located at (43.15°N, 355.85°E) as shown in Fig. 1D.
A time series of one measurement every 12h (at 8:00 and 20:00) was retrieved and smoothed with smoothing parameter 0.008 for temperature and 0.006 for VWC (Supplementary Figs. 8 and 9).A recalibration, whose description is provided in methods Section, was also applied.

The IPCC climatic scenarios
For running scenarios up to the end of the 21 st century, simulations of the Intergouvernmental Panel on Climate Change (IPCC) were used.These simulations are based on a General Circulation Model of the European Centre for Medium-Range Weather Forecasts.Its detailed description is given in the Special Report on the Emissions Scenarios (SRES) published by the Intergouvernmental Panel on Climate Change (IPCC) in 2000 [77].The considered scenarios were all run from 1990 based on observed conditions originally gathered for the time period 1860-1990 [55].Originally run with a 24-minute time step [78], outputs are provided as a gridded product of monthly sampling time.To run model M C ** , four IPCC scenarios were considered: The ECHAM4 A2 assumes a very heterogeneous world, largely based on self-reliance, with a continuation of the local governing modes.It is characterized by a continuous increasing of the populations associated with slow economic growth and slow technological changes.Time series of surface temperature and soil moisture simulated under this scenario are presented in Supplementary Figs. 10 and 11.At the site of study, considering a sliding window of 5 years, it is estimated that the increase of temperature reaches +3.5°C by 2100.The behaviour is however much more complex since it exhibits both higher maxima and lower minima.The grid point the closest to Altamira coordinates, located at (44,269°N, 355,718°E) was used (see Fig. 1A).It has latitudinal and longitudinal resolutions of 2.8° and 2.8°, respectively.
The ECHAM4 B2 scenario assumes a lower rate of population growth (in comparison to A2) in a world committed to local solutions for economic, social and environmental sustainability.It is estimated by the corresponding simulation that this scenario, now considered as unreachable, would have enabled a moderate increase of temperature on average (+2.5°C by 2100) with relatively moderate evolutions in the distribution of extreme values (see Supplementary Figs. 12 and 13).The grid point is the same as for scenario A2.
The ECHAM4 A1B assumes the world population will peak at mid-century and then, it will decrease.It assumes also quick economic growth with rapid introduction of more efficient new technologies and balanced use of multiple energy sources.The grid point is the same as for scenarios A2 and B2.This scenario would lead to an increase of temperature of +3°C on average by 2100, with no significant changes in the distribution of extreme values (see Supplementary Figs. 14 and 15).
The ECHAM4 B1 assumes the same population evolution as A1B.Scenario B1 assumes that clean and efficient technologies will be introduced as well as global solutions leading to sustainability and equity.The grid point is located on the Iberian peninsula and characterized by a very low resolution: 5.6° and 3.2° latitudinal and longitudinal resolutions, respectively.This scenario would lead to a very moderate increase of temperature on average (+2°C by 2100) with no significant changes in the distribution of extreme values (see Supplementary Figs.16 and 17).

The number of visitors
The number of visitors has strongly varied from the early 1950s to the present.Annual data from 1952 were provided by the Museum of Altamira (Supplementary Table 11).Six main periods can be distinguished: an increasing number of visitors from moderately high in 1952 to extremely high in 1974, a quick decrease from 1975 to 1977, no visit from 1978 to 1981, a low controlled level of visits from 1982 to 2002, no visits from 2003 to 2013 and an even lower number of controlled visits from 2013 to the present.The schedule of the visits from 1952 to 1977 is unknown.In the period 1982-2002 there were five groups in the morning, consisting each one in 5 tourists plus a guide; the cave was closed on Mondays.From 2014, there is a single group of 5 tourists plus 2 guides every Friday.

Soil moisture products recalibration
Notable differences in the statistical characteristics (means and variance) are obviously observed between the considered products.This is expected since (i) they have very different spatial resolution (from very local for in situ data to very large resolution for the combined products); (ii) their construction is based on different physical (the response to a local electromagnetic field for in situ measurements, the natural emission of passive microwave for SMOS, etc.) and statistical processes (a combination of heterogeneous data sets for ESA CCI SM scaled using the model GLDAS, and ESM products); and (iii) the estimates are observed/reconstructed at different soil depths.
To make the comparison possible between these different data sets, it was chosen to recalibrate the satellite and hybrid time series, taking the in situ measurements as a reference.The in situ time series presenting a short overlap with some data sets, the following strategy was chosen: First, the data set presenting the longer time span (the ESM product) was calibrated on the in situ time series.Then, the other products (CCI, ECHAM4) were calibrated either on the calibrated ESM time series or on the in situ time series (in each case, the best fit was chosen).In all the cases, a linear fitting was applied enabling to calibrate at the same time the means and the variance of the time series.The data for the IPCC scenarios A1B and B1 present some calibration problems (too high p-value for the intercept and poor r-square, respectively) and, as a consequence, they cannot be considered as realiable as the A2 and B2 scenarios.The results of this inter-calibration are summarized in Supplementary Table 2.

Temperature products recalibration
The temperature products do also present significantly different statistical means and variances in comparison to the in situ records.Here also, such differences are expected due to very different spatial resolution (from very local for in situ data to large and very large for satellite and models resolution), and methodologies (direct versus indirect measurements or physical modelling).The overlapping of the in situ recording with the three other data sets is complete, therefore it was used as direct reference in all cases.The results of the calibration of the temperature time series are summarized in Supplementary Table 3.

Supplementary Note 3 -Obtained Models
Model U C : The model U C obtained for the concentration of CO 2 from a single time series C obs (t) is based on a canonical structure (Eqs.5).It is composed of N p = 13 terms, including 11 ones in the polynomial of which parameter values are provided in Supplementary Table 5.The model enables a consistent reconstruction of the phase space, both visually, and in terms of range (see Fig. 2).
Error growth revealed very comparable results on both the modelling and independent windows, providing a robust argument of validation (Supplementary Table 8).This dynamic is characterized by a relatively low predictability (horizon of predictability H P of ~30 days for a maximum error of 5% with a 90% confidence level, see Supplementary Fig. 3A).This low predictability is very likely due to the multiple external influences acting on the cave atmosphere dynamics onto which the cave has weak dynamical feedback only.
The obtained model exhibits all the properties of chaos gathered in a consistent object.Determinismwhich is a necessary condition for chaos that has to be demonstrated when starting from observational time series [42]is ensured by the equations of ordinary differential structure.All the other properties of chaos (see Supplementary Table 4) are also retrieved in the attractor obtained from these equations: positive first Lyapunov exponent (λ 1 = +8.8•10 - ±1.2•10 -5 ), fractal dimension (D KY = 2.09 ±0.02), and presence of folding in the model flow, whose first return map is characterized by a multi-modal structure containing at least four monotonic branches (see Fig. 2C).

Model U T :
The model U T here obtained for the outside temperature from a single time series T obs (t) is also based on a canonical structure (Eqs.5).It is composed of N p = 30 terms, including 28 ones in the polynomial (S2) of which parameter values are provided in Supplementary Table 6.The model enables a consistent reconstruction of the phase space, both visually, and in terms of range (Fig. 2).Here also, the error growth revealed comparable results on both the modelling and independent windows, providing a strong argument of validation.In comparison to the concentration of CO 2 , this dynamic is characterized by excellent predictability at seasonal scale (horizon of predictability H P of ~120 days for a maximum error of 5% with a 90% confidence level, see Supplementary Fig. 3B).
This model also exhibits all the properties of chaos (Supplementary Table 4), all gathered in a consistent object: determinism is ensured by the equations of ordinary differential structure, positive first Lyapunov exponent (λ 1 = +9.7•10 - ±2.2•10 -4 ), fractal dimension (D KY = 2.13 ± 0.03), and presence of folding in the model flow, characterized by a multi-modal structure of the first return map with three monotonic branches (see Fig. 2C).

Model U W :
The model U W obtained here for the soil water content from a single time series W obs (t) is also based on a canonical structure (Eqs.5).It is composed of N p = 27 terms, including 25 ones in the polynomial of which parameter values are provided in Supplementary Table 8.The model enables a consistent reconstruction of the phase space, both visually, and in terms of range (Fig. 2).Here again, the error growth revealed comparable results on both the modelling and independent window, providing a strong argument for validation.As for the external temperature, its dynamics is characterized by an excellent predictability at seasonal scale (horizon of predictability H P of ~110 days for a maximum error of 5% with a 90% confidence level, see Supplementary Fig. 3C).
Here again, the model exhibits all the properties of chaos (Supplementary Table 4) gathered in a consistent object: determinism is ensured by the equations of ordinary differential structure, first Lyapunov exponent is positive (λ 1 = +1.4•10 - ±1.0•10 -4 ), fractal dimension (D KY = 2.39 ± 0.02), and presence of folding in the model flow, characterized by a multi-modal structure of the first return map with several monotonic branches (see Fig. 2C).

Model M CTW :
This multivariate model M CTW (Eqs.15-20) is six-dimensional since it involves C the concentration of CO 2 in the cave, T, T  (=T 2 ) and T   (=T 3 ) the external temperature and its two first derivatives, and W, W  (=W 2 ) the volumetric water content above the cave and its first derivative.Parameter values are provided in Supplementary Table 9. Eqs.(16-18) correspond to the univariate model U T obtained for the outside temperature, already presented higher in the text (with polynomial U T defined in Eq.S2).The other two polynomials are defined as Several projections of the model phase space are shown in Fig. 3B.The temperature forcing generates a chaotic behaviour (as demonstrated by the two-branch first return map Fig. 3C) on the concentration of CO 2 (and on the soil moisture also).The dynamics is moderately developed in comparison to the observations (maximum temperature values are in the range [17.4; 18.9] ºC for the model, versus [16.5; 18.5] ºC for the observations).Despite this, it shows that the temperature can, not only generate chaotic oscillations on the cave micro-atmosphere, but also synchronize them on its pseudo-periodic behaviour.Moreover, the complexity of the transient (not shown) revealed that temporarily perturbed conditions may give rise to more complexity on both the concentration of CO 2 and the soil moisture, with some temporary de-synchronization which is very consistent with the observed phase portrait.
The analysis of the error growth confirmed the low predictability of the concentration of CO 2 at seasonal scale (horizon of predictability H P of ~10 days for a maximum error of 5% with a 90% confidence level, see Supplementary Fig. 3D) as already observed with the model U C .
The main properties of chaos were also retrieved (Supplementary Table 4): determinism is ensured by the equations of ordinary differential structure, and presence of folding in the model flow characterized by a multi-modal structure of the first return map with two monotonic branches (see Fig. 3C).1).The model enables a very consistent reconstruction of the phase space, both visually, and in terms of range (Fig. 4B).

Models
As described in the main text, additional forcings were added to the model M C * in order to account for the variations of the CO 2 level in the outside from 1950 to 2100 (model M C ** , Eq. 3) and for the CO 2 introduced by the visitors' breath (model M C *** , Eq. 4).The Global Polynomial Modelling (GPoM) algorithm aims to obtain governing equations (or a good approximation of them) for a studied system, directly from observational time series.The algorithm requires a set of time series providing a proper description of the observed system and several parameters: the window length to be used to compute the derivatives of the input time series, the number of variable n x , n y , etc. to be derived from each time series (e.g. for n x = 3, the original time x(t) will be used together with its two first derivatives dx/dt and d 2 /dt 2 ); the maximum polynomial degree q max allowed in the numeric formulation, optionally some specific constraints to be applied to the equations (e.g.max number of parameters, forbidden couplings, etc.), and the maximum integration time steps N max .In output, the models are separated into three groups: (i) rejected due to NaN or too far from the original data (>4•σ.);(ii) classified as fixed point, period-1, period-2, etc.; and (iii) unclassified (and thus requiring further analyses to be classified properly).3.  2. simulations of the ECMWF model, before and after smoothing (A); time series before and after calibration (in dark blue and brown, respectively) with the in situ observations also reported in grey (B).The calibration parameters are provided in Supplementary Table 3.                      3.

1. 3
Soil Moisture -The ESA CCI SM product The European Space Agency's Climate Change Initiative for Soil Moisture (ESA CCI SM) has developed a merging algorithm to generate long-term (1978-2020) data for soil moisture [52].

Fig. 1
Fig.1In situ time series.Original (in gray) and pre-processed (in black) time series of the outside temperature (A) (temperature measured in the Polychrome hall is also provided in blue), of the volumetric water content in the soil at the vertical of the cave (B) (measurements before drift correction is provided in blue), and of CO 2 concentration observed in situ in the polychrome hall (C).

Fig. 2
Fig. 2 General algorithm for obtaining dynamical equations from time series.

Fig. 3
Fig. 3 Models error growth.The error growth of models U C , U T , U W and M CTW is shown in (A)to (D), respectively.For the multivariate model M CTW , only the error growth of variable C is provided since the dynamics of variables W and T strictly relies on models U W and U T for which validation is already provided in (A) to (C).All the particular simulations of the ensemble are shown in thin lines for both the modelling window (in pink) and the validation window (in green).The 90%-percentile is also provided for both ensembles, in thick lines (red and dark green, respectively).

Fig. 4
Fig. 4 Time series of volumetric water content from SMOS satellite.Original observations in black and re-sampled time series in blue (A); smoothed time series before calibration in dark blue (B), time series before and after calibration (blue and brown lines, respectively) where the in situ observations are also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 4 Time series of volumetric water content from SMOS satellite.Original observations in black and re-sampled time series in blue (A); smoothed time series before calibration in dark blue (B), time series before and after calibration (blue and brown lines, respectively) where the in situ observations are also reported in grey (C).The calibration parameters are provided in Supplementary Table2.

Fig. 5
Fig. 5 Time series of volumetric water content from multisatellite product.Original observations of the ESA CCI SM product in black and re-sampled time series in blue (A); smoothed time series before calibration in dark blue (B), time series before and after calibration (blue and brown lines, respectively), where the in situ observations are also reported in grey and the ESM time series, used for the intercalibration, in green (C).The calibration parameters are provided in Supplementary Table2.
Fig. 5 Time series of volumetric water content from multisatellite product.Original observations of the ESA CCI SM product in black and re-sampled time series in blue (A); smoothed time series before calibration in dark blue (B), time series before and after calibration (blue and brown lines, respectively), where the in situ observations are also reported in grey and the ESM time series, used for the intercalibration, in green (C).The calibration parameters are provided in Supplementary Table2.
Fig. 5 Time series of volumetric water content from multisatellite product.Original observations of the ESA CCI SM product in black and re-sampled time series in blue (A); smoothed time series before calibration in dark blue (B), time series before and after calibration (blue and brown lines, respectively), where the in situ observations are also reported in grey and the ESM time series, used for the intercalibration, in green (C).The calibration parameters are provided in Supplementary Table2.

Fig. 6
Fig. 6 Time series of volumetric water content from hybrid product.Original observations of the ESM Global Multi-layer Soil Moisture Product (blue line), after calibration (brown line), with the in situ observations also reported (grey line).The calibration parameters are provided in Supplementary Table2.
Fig. 6 Time series of volumetric water content from hybrid product.Original observations of the ESM Global Multi-layer Soil Moisture Product (blue line), after calibration (brown line), with the in situ observations also reported (grey line).The calibration parameters are provided in Supplementary Table2.

Fig. 7 LST
Fig. 7 LST Satellite time series.Original data (black), re-sampled data (light blue) and smooth data (dark blue) (A); smooth data before calibration (dark blue), after calibration (brown) and observational time series used for calibration (grey) (B).The calibration parameters are provided in Supplementary Table3.

Fig. 8
Fig. 8 Time series of volumetric water content from the ECMWF model.Original product of the ERA5 simulations of the ECMWF model, before and after smoothing (A); time series before and after calibration (in dark blue and brown, respectively) with the in situ observations also reported in grey (B).The calibration parameters are provided in Supplementary Table2.

Fig. 9
Fig. 9 Time series of temperature from the ECMWF model.Original product of the ERA5 simulations of the ECMWF model, before and after smoothing (A); time series before and after calibration (in dark blue and brown, respectively) with the in situ observations also reported in grey (B).The calibration parameters are provided in Supplementary Table3.

Fig. 10
Fig. 10 Time series of volumetric water content from the IPCC A2 model.Original product of the IPCC A2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 10 Time series of volumetric water content from the IPCC A2 model.Original product of the IPCC A2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 10 Time series of volumetric water content from the IPCC A2 model.Original product of the IPCC A2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.

Fig. 11
Fig. 11 Time series of temperature from the IPCC A2 model.Original product of the IPCC A2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.
Fig. 11 Time series of temperature from the IPCC A2 model.Original product of the IPCC A2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.
Fig. 11 Time series of temperature from the IPCC A2 model.Original product of the IPCC A2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.

Fig. 12
Fig. 12 Time series of volumetric water content from the IPCC B2 model.Original product of the IPCC B2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 12 Time series of volumetric water content from the IPCC B2 model.Original product of the IPCC B2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 12 Time series of volumetric water content from the IPCC B2 model.Original product of the IPCC B2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.

Fig. 13
Fig. 13 Time series of temperature from the IPCC B2 model.Original product of the IPCC B2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.
Fig. 13 Time series of temperature from the IPCC B2 model.Original product of the IPCC B2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.
Fig. 13 Time series of temperature from the IPCC B2 model.Original product of the IPCC B2 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.

Fig. 14
Fig. 14 Time series of volumetric water content from the IPCC A1B model.Original product of the IPCC A1B model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 14 Time series of volumetric water content from the IPCC A1B model.Original product of the IPCC A1B model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.

Fig. 15
Fig. 15 Time series of temperature from the IPCC A1B model.Original product of the IPCC A1B model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.
Fig. 15 Time series of temperature from the IPCC A1B model.Original product of the IPCC A1B model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.
Fig. 15 Time series of temperature from the IPCC A1B model.Original product of the IPCC A1B model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.

Fig. 16
Fig. 16 Time series of volumetric water content from the IPCC B1 model.Original product of the IPCC B1 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 16 Time series of volumetric water content from the IPCC B1 model.Original product of the IPCC B1 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.
Fig. 16 Time series of volumetric water content from the IPCC B1 model.Original product of the IPCC B1 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table2.

Fig. 17
Fig. 17 Time series of temperature from the IPCC B1 model.Original product of the IPCC B1 model, before and after resampling (A); smoothed time series before calibration (B), time series before and after calibration in blue and brown, respectively, with the in situ observations also reported in grey (C).The calibration parameters are provided in Supplementary Table3.
CTW for the variations of the concentration of CO 2 (Eq.15) and replacing the variables T and W by an explicit forcing T obs (t) and W obs (t) ensured in practice by genuine observational time series of external temperature and soil water content, either observed in situ with T in situ (t) and W in situ (t), from satellite with T MODIS (t), W SMOS (t) and W CCI (t) or based on hybrid products with T ECMWF (t), T IPCC (t), W ESM (t) and W IPCC (t) (these are all synthesized in Supplementary Table

Table 1
Data sets considered in the study.Full details are provided in the text (see Supplementary Note 2). *

Table 2
Soil moisture time series inter-calibration.

Table 3
Temperature time series inter-calibration.

Table 4
Parameters tested in the modelling process and characteristics of the obtained models.
Multiple options were tried since this parameter can determine whether models can be found or not.×× Both unconstrainted and constrained algebraic structures were tried for the multivariate model (see methods section in the main text for further details).××× Due to the Runge-Kutta (RK4) method used to integrate the equations, a refined sampling time of the forcing time series was required in comparison to the other variables. ×

Table 5
Coefficients of polynomial P C (Eq. S1 in Supplementary Note 3) for model U C.

Table 6
Coefficients of polynomial P T (Eq.S2 in Supplementary Note 3) for model U T.

Table 7
Coefficients of polynomial P W (Eq. S3 in Supplementary Note 3) for model U W .

Table 8
Forecasting errors (90% percentile of the relative error of CO 2 concentration, temperature and Volumetric Water Content) in the identification and validation windows for different horizons of prediction.

Table 10
Forecasting errors (90% percentile of the relative error of CO 2 concentration in ppm) in the identification and validation windows for different horizons of prediction.